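Extracting the latent sources of complex stimuli is critical for making sense of the world. Although the brain solves this blind source separation (BSS) problem continuously, its algorithms remain unknown. Previous work on biologically plausible BSS algorithms assumed that the observed signals are linear mixtures of statistically independent or uncorrelated sources, which limits the domain of applicability of these algorithms. To overcome this limitation, we propose novel biologically plausible neural networks for the blind separation of potentially dependent/correlated sources. Unlike previous work, we assume general geometric conditions on the source vectors, rather than statistical ones, which allows the separation of potentially dependent/correlated sources. Specifically, we assume that the source vectors are sufficiently scattered in their domains, which can be described by certain polytopes. We then consider recovering these sources through the Det-Max criterion, which maximizes the determinant of the output correlation matrix in order to enforce a similar spread for the source estimates. Starting from this normative principle, and using a weighted similarity matching approach that accommodates arbitrary linear transformations adaptable by local learning rules, we derive two-layer biologically plausible neural network algorithms that can separate mixtures into sources coming from a variety of source domains. We demonstrate that our algorithms outperform other biologically plausible BSS algorithms on correlated source separation problems.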
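Self-supervised learning allows AI systems to learn effective representations from large amounts of data using tasks that do not require costly labels. Mode collapse, i.e., a model producing the same representation for all inputs, is a central problem for many self-supervised learning approaches and can render self-supervised tasks, such as matching distorted variants of an input, ineffective. In this paper, we argue that a direct application of information maximization between alternative latent representations of the same input naturally solves the collapse problem and achieves competitive empirical results. We propose CorInfoMax, a self-supervised learning method that uses a mutual information measure based on second-order statistics, which reflects the level of correlation between its arguments. Maximizing this correlative information measure between alternative representations of the same input serves two purposes: (1) it avoids the collapse problem by generating feature vectors with non-degenerate covariances; (2) it establishes relevance between the alternative representations by increasing the linear dependence between them. An approximation of the proposed information maximization objective simplifies to a Euclidean-distance-based objective function regularized by the log-determinant of the feature covariance matrix. The regularization term acts as a natural barrier against feature space degeneracy. Consequently, beyond avoiding complete output collapse to a single point, the proposed approach also prevents dimensional collapse by encouraging information to spread across the whole feature space. Numerical experiments show that CorInfoMax achieves better or competitive performance relative to state-of-the-art SSL approaches.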
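As a rough illustration of the kind of objective described above (a Euclidean distance between two views, regularized by the log-determinant of each view's feature covariance), the following minimal sketch may help. It is not the authors' implementation: the function names, the weight `mu`, the diagonal offset `eps`, and the use of a plain batch covariance are all assumptions made for illustration.

```python
import math

def covariance(rows, eps=1e-3):
    """Batch covariance of row vectors plus eps * I (eps is an assumed offset
    that keeps the log-determinant finite when features collapse)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / n
            for j in range(d)] for i in range(d)]
    for i in range(d):
        cov[i][i] += eps
    return cov

def log_det(mat):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    d = len(mat)
    chol = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = mat[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    return 2.0 * sum(math.log(chol[i][i]) for i in range(d))

def logdet_regularized_loss(z1, z2, mu=1.0, eps=1e-3):
    """Mean squared Euclidean distance between views, minus the log-det
    'spread' of each view's covariance (hypothetical weighting via mu)."""
    n = len(z1)
    dist = sum(sum((a - b) ** 2 for a, b in zip(u, v))
               for u, v in zip(z1, z2)) / n
    spread = log_det(covariance(z1, eps)) + log_det(covariance(z2, eps))
    return mu * dist - spread

# Collapsed features (every input mapped to one point) versus spread features.
collapsed = [[1.0, 1.0]] * 4
spread_out = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
loss_collapsed = logdet_regularized_loss(collapsed, collapsed)
loss_spread = logdet_regularized_loss(spread_out, spread_out)
```

In this toy case both pairs of views match exactly, so the distance term is zero and only the log-determinant term differs: the collapsed batch is penalized heavily, which is the barrier effect the abstract attributes to the regularizer.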
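3D Flash LiDAR is an alternative to traditional scanning LiDAR systems, promising precise depth imaging in a compact form factor and without moving parts, for applications such as autonomous vehicles, robotics, and augmented reality (AR). Typically implemented with single-photon, direct time-of-flight (dToF) receivers in an image sensor format, the operation of these devices can be hindered by the large number of photon events that must be processed and compressed in outdoor scenes, as well as by limited scalability to larger arrays. We present here a 64x32 pixel (256x128 SPAD) dToF imager that overcomes these limitations by using pixels with embedded histogramming, which lock onto and track the return signal. This considerably reduces the size of the output data frames, enabling maximum frame rates in the 10 kFPS range, or 100 kFPS for direct depth readings. The sensor can selectively read out pixels that detect surfaces or sense motion, reducing power consumption and off-chip processing requirements. We demonstrate the application of the sensor in mid-range LiDAR.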
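The histogramming idea behind dToF depth sensing can be sketched in a few lines. This is only a generic illustration, not the sensor's on-chip logic: the bin width, the number of bins, and the simple peak-picking rule are assumptions, and the tracking behaviour mentioned above is not modelled.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def build_histogram(timestamps_ns, bin_width_ns, num_bins):
    """Count photon arrival times (ns) into fixed-width time bins."""
    hist = [0] * num_bins
    for t in timestamps_ns:
        idx = int(t // bin_width_ns)
        if 0 <= idx < num_bins:  # ignore events outside the histogram range
            hist[idx] += 1
    return hist

def peak_depth_m(hist, bin_width_ns):
    """Depth from the fullest bin: round-trip time at the bin centre,
    converted to a one-way distance."""
    peak = max(range(len(hist)), key=hist.__getitem__)
    tof_s = (peak + 0.5) * bin_width_ns * 1e-9
    return SPEED_OF_LIGHT * tof_s / 2.0

# Four return photons near 20 ns among two background events (made-up data).
events_ns = [3.1, 20.2, 20.4, 20.7, 20.9, 45.0]
hist = build_histogram(events_ns, bin_width_ns=1.0, num_bins=64)
depth = peak_depth_m(hist, bin_width_ns=1.0)
```

Reading out only the peak position instead of every photon timestamp is what shrinks the output data: one number per pixel rather than a stream of events.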
While the capabilities of autonomous systems have been steadily improving in recent years, these systems still struggle to rapidly explore previously unknown environments without the aid of GPS-assisted navigation. The DARPA Subterranean (SubT) Challenge aimed to fast-track the development of autonomous exploration systems by evaluating their performance in real-world underground search-and-rescue scenarios. Subterranean environments present a plethora of challenges for robotic systems, such as limited communications, complex topology, visually degraded sensing, and harsh terrain. The presented solution enables long-term autonomy with minimal human supervision by combining a powerful and independent single-agent autonomy stack with higher-level mission management operating over a flexible mesh network. The autonomy suite deployed on quadruped and wheeled robots was fully independent, freeing the human supervisor to oversee the mission loosely and make high-impact strategic decisions. We also discuss lessons learned from fielding our system at the SubT Final Event, relating to vehicle versatility, system adaptability, and reconfigurable communications.
We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient because it uses discrete tokens and requires fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient because it uses parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, which translates to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, and cardinality. Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at https://muse-model.github.io
Optical coherence tomography (OCT) captures cross-sectional data and is used for the screening, monitoring, and treatment planning of retinal diseases. Technological developments that increase the speed of acquisition often result in systems with a narrower spectral bandwidth, and hence a lower axial resolution. Traditionally, image-processing-based techniques have been used to reconstruct subsampled OCT data; more recently, deep-learning-based methods have been explored. In this study, we simulate reduced axial scan (A-scan) resolution by Gaussian windowing in the spectral domain and investigate the use of a learning-based approach for image feature reconstruction. In anticipation of the reduced resolution that accompanies wide-field OCT systems, we build upon super-resolution techniques and reconstruct lost features using a pixel-to-pixel approach with an altered super-resolution generative adversarial network (SRGAN) architecture, with the aim of better supporting clinicians in their decision-making and improving patient outcomes.
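The simulation step mentioned above, narrowing the effective spectral bandwidth with a Gaussian window so that the A-scan loses axial resolution, can be illustrated on a synthetic signal. This is a toy sketch, not the study's pipeline: the single-reflector fringe, the window widths, and the helper names are assumptions, and a naive DFT stands in for a proper FFT to keep the example dependency-free.

```python
import cmath
import math

def gaussian_window(n, sigma_fraction):
    """Gaussian window over n spectral samples; sigma is a fraction of n."""
    center = (n - 1) / 2.0
    sigma = sigma_fraction * n
    return [math.exp(-0.5 * ((k - center) / sigma) ** 2) for k in range(n)]

def a_scan(spectrum):
    """Depth profile: magnitude of the naive DFT of a spectral interferogram
    (positive depths only)."""
    n = len(spectrum)
    return [abs(sum(s * cmath.exp(-2j * math.pi * k * z / n)
                    for k, s in enumerate(spectrum)))
            for z in range(n // 2)]

def width_at_half_max(profile):
    """Number of depth samples at or above half of the peak value."""
    peak = max(profile)
    return sum(1 for v in profile if v >= peak / 2.0)

n = 128
# Synthetic fringe from a single reflector at depth index 20 (made-up data).
fringe = [math.cos(2.0 * math.pi * 20 * k / n) for k in range(n)]

broad = a_scan([f * w for f, w in zip(fringe, gaussian_window(n, 0.25))])
narrow = a_scan([f * w for f, w in zip(fringe, gaussian_window(n, 0.05))])

broad_width = width_at_half_max(broad)
narrow_width = width_at_half_max(narrow)
```

The narrower spectral window yields a wider peak in depth, reflecting the inverse relationship between spectral bandwidth and axial resolution that the abstract relies on.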
Compliance in actuation has been exploited to generate highly dynamic maneuvers, such as throwing, that take advantage of the potential energy stored in joint springs. However, the energy storage and release could not yet be well timed. Moreover, for multi-link systems, the natural system dynamics might even work against the actual goal. The introduction of variable stiffness actuators has partially addressed this problem: with a suitable optimal control strategy, an approximate decoupling of the motor from the link can be achieved to maximize the energy transfer into the distal link prior to launch. However, such continuous stiffness variation is complex and typically leads to oscillatory swing-up motions instead of clear launch sequences. To circumvent this issue, we investigate decoupling for speed maximization with a dedicated novel actuator concept denoted Bi-Stiffness Actuation. With this concept, it is possible to fully decouple the link from the joint mechanism by a switch-and-hold clutch while keeping the elastic energy stored. We show that with this novel paradigm it is possible not only to reach the same optimal performance as with power-equivalent variable stiffness actuation, but also to directly control the energy transfer timing. This is a major step forward compared to previous optimal control approaches, which rely on optimizing the full time-series control input.
Diabetic Retinopathy (DR) is a leading cause of vision loss in the world, and early DR detection is necessary to prevent vision loss and support appropriate treatment. In this work, we leverage interactive machine learning and introduce a joint learning framework, termed DRG-Net, to effectively learn both disease grading and multi-lesion segmentation. DRG-Net consists of two modules: (i) DRG-AI-System, which grades DR, localizes lesion areas, and provides visual explanations; and (ii) DRG-Expert-Interaction, which receives feedback from expert users and improves the DRG-AI-System. To deal with sparse data, we use transfer learning mechanisms to extract invariant feature representations by means of the Wasserstein distance and adversarial-learning-based entropy minimization. In addition, we propose a novel attention strategy at both low- and high-level features to automatically select the most significant lesion information and provide explainable properties. In terms of human interaction, we further develop DRG-Net as a tool that enables expert users to correct the system's predictions, which may then be used to update the system as a whole. Moreover, thanks to the attention mechanism and the loss-function constraints between lesion features and classification features, our approach can remain robust given a certain level of noise in user feedback. We have benchmarked DRG-Net on the two largest DR datasets, i.e., IDRID and FGADR, and compared it to various state-of-the-art deep learning networks. In addition to outperforming other SOTA approaches, DRG-Net is effectively updated using user feedback, even in a weakly supervised manner.
In this work, a novel recommender system (RS) for tourism is presented. The RS is context-aware, as is now the rule in state-of-the-art recommender systems, and works on top of a tourism ontology that is used to group the different items being offered. The presented RS mixes different types of recommenders, creating an ensemble that changes on the basis of the RS's maturity: it starts from simple content-based recommendations and iteratively adds popularity, demographic, and collaborative filtering methods as rating density and user cardinality increase. The result is an RS that mutates during its lifetime and uses a tourism ontology and natural language processing (NLP) to correctly bin the items into specific item categories and meta-categories in the ontology. This item classification facilitates the association between user preferences and items, and allows the items being offered to be better classified and grouped, which in turn is particularly useful for context-aware filtering.
Neural compression offers a domain-agnostic approach to creating codecs for lossy or lossless compression via deep generative models. For sequence compression, however, most deep sequence models have costs that scale with the sequence length rather than the sequence complexity. In this work, we instead treat data sequences as observations from an underlying continuous-time process and learn how to discretize efficiently while retaining information about the full sequence. As a consequence of decoupling sequential information from its temporal discretization, our approach allows for greater compression rates and lower computational complexity. Moreover, the continuous-time approach naturally allows us to decode at different time intervals. We empirically verify our approach on multiple domains involving compression of video and motion capture sequences, showing that it can automatically achieve reductions in bit rates by learning how to discretize.